Improve quantile performance v2 #91
base: master
Codecov Report
@@ Coverage Diff @@
## master #91 +/- ##
==========================================
+ Coverage 96.89% 96.91% +0.02%
==========================================
Files 1 1
Lines 419 422 +3
==========================================
+ Hits 406 409 +3
Misses 13 13
The only drawback of this approach is the case when very many quantiles are requested as we sort |
Thanks. Is there any reason to think that a series of partial sorts of nested subsets of the data would be significantly slower than a single full sort? Have you tried benchmarking this?
We have to sort |
Here are the benchmarks:
So, as you can see, it starts to deteriorate much faster. (As usual, it would not hurt if you double-checked this when you have time, as I might have made an error here.)
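For anyone who wants to double-check the timings locally, here is a minimal sketch. It goes through the public `quantile` API only, so it measures whichever implementation is installed rather than this PR's code specifically. `best_time` and `compare_quantile_timings` are hypothetical helper names, the evenly spaced probabilities and sizes are arbitrary assumptions, and no particular result is implied.

```julia
using Statistics, Random

# Hypothetical helper: best of `reps` timings of `f()`, in seconds.
function best_time(f; reps = 5)
    f()  # warm-up run so that compilation is not measured
    return minimum(@elapsed(f()) for _ in 1:reps)
end

# Hypothetical helper: for each number of quantiles `k`, time
#   (a) `quantile(v, p)`, which sorts only what it needs internally, and
#   (b) a full `sort` followed by `quantile(...; sorted = true)`.
function compare_quantile_timings(n, ks; rng = MersenneTwister(0))
    v = rand(rng, n)
    return map(ks) do k
        p = collect(1:k) ./ (k + 1)  # k evenly spaced interior probabilities
        t_partial = best_time(() -> quantile(v, p))
        t_full = best_time(() -> quantile(sort(v), p; sorted = true))
        (k = k, partial = t_partial, full = t_full)
    end
end

for r in compare_quantile_timings(100_000, [1, 10, 100, 1_000])
    println(r)
end
```

Taking the minimum over a few repetitions is a crude stand-in for a proper benchmarking package; it keeps the sketch dependency-free.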
I realize I had two comments pending...
sort!(v, 1, lv, Base.Sort.PartialQuickSort(lo:hi), Base.Sort.Forward)
start = 1
for pv in sort(p)
    lv = length(v)
Move this out of the loop? BTW, better use `lastindex` even if we call `require_one_based_indexing(v)`.
    lo = floor(Int,pv*(lv))
    hi = ceil(Int,1+pv*(lv))
    sort!(v, start, lv, Base.Sort.PartialQuickSort(lo:hi), Base.Sort.Forward)
    start = hi + 1
Are you completely sure of the `+1`? Is that still correct if `p` contains duplicates? That would be worth testing.
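A possible shape for such a test, as a sketch only: it uses the public `quantile` function from the `Statistics` standard library and checks the result against per-element calls and against a fully sorted input, so it does not depend on how the partial sorting is implemented. The seed, vector length, and probability values are arbitrary choices made for illustration.

```julia
using Statistics, Random, Test

v = rand(MersenneTwister(42), 1_000)

# Unsorted probabilities with exact duplicates, two values close enough that
# their index ranges overlap (0.5 and 0.5004 with 1000 points), and both endpoints.
p = [0.5, 0.1, 0.5004, 0.5, 0.1, 0.0, 1.0, 1.0]

q = quantile(v, p)

# Each entry should match the quantile computed on its own.
@test q == [quantile(v, pv) for pv in p]

# The result should match what a full sort gives.
@test q == quantile(sort(v), p; sorted = true)

# Duplicated probabilities should give identical values.
@test q[1] == q[4] && q[2] == q[5]
```

Comparing against `sorted = true` on a fully sorted copy is the useful part here: if the `+1` skipped an index that a later, overlapping range still needed, the two results would be expected to differ.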
I'm not sure it's worth worrying about performance when the number of quantiles is large compared to the data. Quantiles don't make a lot of sense in that case. Maybe a simple optimization is to do |
This is an alternative implementation to #86.
Here, following #86 (comment), I perform the partial sorting incrementally.
I am making this a separate PR to allow an easy comparison of the two. Only one of them should be merged.